OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Building Parallel Corpora by Automatic Title Alignment

Internal identifier: 001932 (Main/Exploration); previous: 001931; next: 001933

Authors: C. Yang [Hong Kong]; Kar Wing Li [Hong Kong]

Source: Lecture Notes in Computer Science, 2002 (ISSN 0302-9743)

RBID : ISTEX:8BF74046B164813ED2C183705AC3CC5E9BA4E459

Abstract

Cross-lingual semantic interoperability has drawn significant research attention recently, as the number of digital libraries in non-English languages has grown exponentially. Cross-lingual information retrieval (CLIR) across different European languages, such as English, Spanish and French, has been widely explored, but CLIR across European and Oriental languages is still at the initial stages. To cross the language boundary, a corpus-based approach shows promise of overcoming the limitations of knowledge-based and controlled vocabulary approaches. However, collecting parallel corpora between European and Oriental languages is not an easy task. Length-based and text-based approaches are two major approaches to align parallel documents. In this paper, we investigate several techniques using these approaches, and compare their performance in aligning English and Chinese titles of parallel documents available on the Web.
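
To make the idea of length-based alignment concrete, here is a minimal sketch in Python. It is not the method evaluated in the paper: the scoring function, the greedy pairing, the names `length_score` and `align_titles`, and the default `expected_ratio` of 3.0 English characters per Chinese character are all assumptions chosen for illustration only.

```python
def length_score(en_title: str, zh_title: str, expected_ratio: float = 3.0) -> float:
    """Score in [0, 1]: 1.0 when the observed length ratio equals expected_ratio.

    expected_ratio is a hypothetical value, not a figure from the paper.
    """
    en_len = len(en_title.replace(" ", ""))
    zh_len = len(zh_title.replace(" ", ""))
    if en_len == 0 or zh_len == 0:
        return 0.0
    ratio = en_len / zh_len
    # Symmetric closeness of the observed ratio to the expected one.
    return min(ratio, expected_ratio) / max(ratio, expected_ratio)


def align_titles(en_titles, zh_titles, expected_ratio=3.0):
    """Greedily pair titles, taking the highest-scoring unused pair first."""
    candidates = sorted(
        (
            (length_score(en, zh, expected_ratio), i, j)
            for i, en in enumerate(en_titles)
            for j, zh in enumerate(zh_titles)
        ),
        key=lambda item: -item[0],
    )
    used_en, used_zh, pairs = set(), set(), []
    for score, i, j in candidates:
        if i in used_en or j in used_zh:
            continue
        used_en.add(i)
        used_zh.add(j)
        pairs.append((en_titles[i], zh_titles[j], score))
    return pairs
```

A text-based approach would replace `length_score` with a measure that compares the words themselves, for example through a bilingual dictionary; length alone cannot separate two titles of similar size.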

Url: https://api.istex.fr/document/8BF74046B164813ED2C183705AC3CC5E9BA4E459/fulltext/pdf
DOI: 10.1007/3-540-36227-4_38


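Affiliations: Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Sha Tin, Hong Kong (both authors)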



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Building Parallel Corpora by Automatic Title Alignment</title>
<author>
<name sortKey="Yang, C" sort="Yang, C" uniqKey="Yang C" first="C." last="Yang">C. Yang</name>
</author>
<author>
<name sortKey="Wing Li, Kar" sort="Wing Li, Kar" uniqKey="Wing Li K" first="Kar" last="Wing Li">Kar Wing Li</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:8BF74046B164813ED2C183705AC3CC5E9BA4E459</idno>
<date when="2002" year="2002">2002</date>
<idno type="doi">10.1007/3-540-36227-4_38</idno>
<idno type="url">https://api.istex.fr/document/8BF74046B164813ED2C183705AC3CC5E9BA4E459/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001704</idno>
<idno type="wicri:Area/Istex/Curation">001609</idno>
<idno type="wicri:Area/Istex/Checkpoint">001046</idno>
<idno type="wicri:doubleKey">0302-9743:2002:Yang C:building:parallel:corpora</idno>
<idno type="wicri:Area/Main/Merge">001A12</idno>
<idno type="wicri:Area/Main/Curation">001932</idno>
<idno type="wicri:Area/Main/Exploration">001932</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Building Parallel Corpora by Automatic Title Alignment</title>
<author>
<name sortKey="Yang, C" sort="Yang, C" uniqKey="Yang C" first="C." last="Yang">C. Yang</name>
<affiliation wicri:level="4">
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong</wicri:regionArea>
<placeName>
<settlement type="city">Sha Tin</settlement>
</placeName>
<orgName type="university">Université chinoise de Hong Kong</orgName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Hong Kong</country>
</affiliation>
</author>
<author>
<name sortKey="Wing Li, Kar" sort="Wing Li, Kar" uniqKey="Wing Li K" first="Kar" last="Wing Li">Kar Wing Li</name>
<affiliation wicri:level="4">
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong</wicri:regionArea>
<placeName>
<settlement type="city">Sha Tin</settlement>
</placeName>
<orgName type="university">Université chinoise de Hong Kong</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2002</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">8BF74046B164813ED2C183705AC3CC5E9BA4E459</idno>
<idno type="DOI">10.1007/3-540-36227-4_38</idno>
<idno type="ChapterID">38</idno>
<idno type="ChapterID">Chap38</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Cross-lingual semantic interoperability has drawn significant research attention recently, as the number of digital libraries in non-English languages has grown exponentially. Cross-lingual information retrieval (CLIR) across different European languages, such as English, Spanish and French, has been widely explored, but CLIR across European and Oriental languages is still at the initial stages. To cross the language boundary, a corpus-based approach shows promise of overcoming the limitations of knowledge-based and controlled vocabulary approaches. However, collecting parallel corpora between European and Oriental languages is not an easy task. Length-based and text-based approaches are two major approaches to align parallel documents. In this paper, we investigate several techniques using these approaches, and compare their performance in aligning English and Chinese titles of parallel documents available on the Web.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Hong Kong</li>
</country>
<settlement>
<li>Sha Tin</li>
</settlement>
<orgName>
<li>Université chinoise de Hong Kong</li>
</orgName>
</list>
<tree>
<country name="Hong Kong">
<noRegion>
<name sortKey="Yang, C" sort="Yang, C" uniqKey="Yang C" first="C." last="Yang">C. Yang</name>
</noRegion>
<name sortKey="Wing Li, Kar" sort="Wing Li, Kar" uniqKey="Wing Li K" first="Kar" last="Wing Li">Kar Wing Li</name>
<name sortKey="Yang, C" sort="Yang, C" uniqKey="Yang C" first="C." last="Yang">C. Yang</name>
</country>
</tree>
</affiliations>
</record>
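
As an illustration of how the record above might be read programmatically, the following Python sketch extracts the title, authors, year and DOI with the standard library. The record uses the `wicri:` prefix without declaring it, so a standard XML parser rejects it as written; the sketch binds the prefix to a placeholder URI first. The URI `urn:example:wicri`, the function name `parse_record` and the shortened `SAMPLE` string are assumptions made for this example, not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder URI, used only so the parser accepts the wicri: prefix.
WICRI_NS = "urn:example:wicri"

# Shortened copy of the record shown above.
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Building Parallel Corpora by Automatic Title Alignment</title>
<author>
<name sortKey="Yang, C" sort="Yang, C" uniqKey="Yang C" first="C." last="Yang">C. Yang</name>
</author>
<author>
<name sortKey="Wing Li, Kar" sort="Wing Li, Kar" uniqKey="Wing Li K" first="Kar" last="Wing Li">Kar Wing Li</name>
</author>
</titleStmt>
<publicationStmt>
<date when="2002" year="2002">2002</date>
<idno type="doi">10.1007/3-540-36227-4_38</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(xml_text: str) -> dict:
    """Return the title, author names, year and DOI of one record."""
    # Declare the wicri: prefix on the root element before parsing.
    bound = xml_text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    root = ET.fromstring(bound)
    title_stmt = root.find(".//titleStmt")
    pub_stmt = root.find(".//publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "year": pub_stmt.findtext("date"),
        "doi": pub_stmt.findtext("idno[@type='doi']"),
    }


if __name__ == "__main__":
    print(parse_record(SAMPLE))
```

The same function should work on the full record, since it only looks inside `titleStmt` and `publicationStmt`.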

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001932 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001932 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:8BF74046B164813ED2C183705AC3CC5E9BA4E459
   |texte=   Building Parallel Corpora by Automatic Title Alignment
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024